19.1 Inference as Optimization
Exact inference can be described as an optimization problem.
- Goal: compute \(\log p(v; \theta)\)
- Challenge: costly to marginalize out h
- Solution: Compute a lower bound on \(\log p(v; \theta)\), known as the evidence lower bound (ELBO).
Introduce \(q(h|v)\), an arbitrary distribution over h:
\[\begin{split}\begin{equation}
\begin{split}
KL[q(h|v) || p(h|v)] &= \sum_h q(h|v) \log \frac{q(h|v)}{p(h|v)} \\
&= \sum_h q(h|v) \log \frac{q(h|v)}{\frac{p(h, v)}{p(v)}} \\
&= \sum_h q(h|v) \log p(v) + \sum_h q(h|v) \log q(h|v) - \sum_h q(h|v) \log p(v, h) \\
&= \log p(v) - H(q(h|v)) - E_{h\sim q(h|v)}[\log p(v, h)]
\end{split}
\end{equation}\end{split}\]
Now we have
\[\log p(v) = E_{h\sim q(h|v)}[\log p(v, h)] + KL[q(h|v) || p(h|v)] + H(q)\]
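As a sanity check, this identity can be verified numerically on a toy discrete model. The joint table `p_vh`, the observed value `v`, and the distribution `q` below are made-up examples, not from the text:

```python
# Minimal numeric check of: log p(v) = E_q[log p(v,h)] + KL[q||p(h|v)] + H(q)
import numpy as np

p_vh = np.array([[0.20, 0.10],   # p(v=0, h=0), p(v=0, h=1)
                 [0.45, 0.25]])  # p(v=1, h=0), p(v=1, h=1)
v = 1                            # observed value of v
p_v = p_vh[v].sum()              # p(v), marginalizing out h
p_h_given_v = p_vh[v] / p_v      # exact posterior p(h|v)
q = np.array([0.7, 0.3])         # arbitrary q(h|v)

expected_log_joint = np.sum(q * np.log(p_vh[v]))   # E_{h~q}[log p(v, h)]
kl = np.sum(q * np.log(q / p_h_given_v))           # KL[q(h|v) || p(h|v)]
entropy_q = -np.sum(q * np.log(q))                 # H(q)

print(np.log(p_v))                                 # log p(v) ~ -0.3567
print(expected_log_joint + kl + entropy_q)         # same value, as the identity requires
```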
Because the KL divergence is always nonnegative, we obtain the lower bound
\[L(v, \theta, q) = E_{h\sim q(h|v)}[\log p(v, h)] + H(q(h|v))\]
For appropriate choices of q, L is tractable to compute. For q(h|v) that better approximates p(h|v), the lower bound is tighter, i.e., closer to \(\log p(v)\). When \(q(h|v) = p(h|v)\), the approximation is perfect and \(L(v, \theta, q) = \log p(v; \theta)\).
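A small sketch of this tightness property on the same style of made-up toy model: L stays below \(\log p(v)\) and reaches it exactly when q equals the true posterior.

```python
# The bound L(v, theta, q) is tight exactly when q(h|v) = p(h|v).
import numpy as np

p_vh = np.array([[0.20, 0.10],
                 [0.45, 0.25]])   # arbitrary joint p(v, h)
v = 1
p_v = p_vh[v].sum()
posterior = p_vh[v] / p_v         # true p(h|v)

def elbo(q):
    """L(v, theta, q) = E_{h~q}[log p(v, h)] + H(q) for a binary h."""
    return np.sum(q * np.log(p_vh[v])) - np.sum(q * np.log(q))

for q0 in [0.1, 0.3, 0.5, posterior[0], 0.9]:
    q = np.array([q0, 1.0 - q0])
    print(f"q(h=0|v)={q0:.3f}  L={elbo(q):.4f}  log p(v)={np.log(p_v):.4f}")
# L <= log p(v) in every row, with equality when q equals the posterior.
```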
We can think of inference as the procedure for finding the q that maximizes L. Possible approaches (a small optimization sketch follows this list):
- Exact inference maximizes L perfectly by searching over a family of distributions q that includes p(h|v) (brute force).
- Approximate inference restricts the family of distributions q that the optimization is allowed to search over.
- Approximate inference may also use an imperfect optimization procedure that does not completely maximize L but merely increases it by a significant amount.
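Below is a minimal sketch combining the last two ideas: a made-up two-variable toy model, a restricted (factorized, mean-field) family for q, and plain gradient ascent on L. All names, numbers, and the choice of optimizer are illustrative assumptions, not from the text.

```python
import numpy as np

# p(v, h1, h2) for the fixed observed v: an arbitrary 2x2 table over (h1, h2).
# Its true posterior p(h1, h2 | v) does not factorize, so the mean-field
# family cannot represent it exactly and the bound stays slightly loose.
p_v_h = np.array([[0.30, 0.05],
                  [0.10, 0.25]])          # entries sum to p(v) = 0.70
log_p_v = np.log(p_v_h.sum())

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elbo(q1, q2):
    """L(v, theta, q) = E_q[log p(v, h1, h2)] + H(q) for q = q1(h1) q2(h2)."""
    q = np.outer(q1, q2)                  # factorized joint over (h1, h2)
    return np.sum(q * np.log(p_v_h)) - np.sum(q * np.log(q))

def elbo_from_logits(t):
    q1 = np.array([1.0 - sigmoid(t[0]), sigmoid(t[0])])
    q2 = np.array([1.0 - sigmoid(t[1]), sigmoid(t[1])])
    return elbo(q1, q2)

theta = np.zeros(2)                       # logits parameterizing q1 and q2
lr, eps = 0.5, 1e-5
for _ in range(200):                      # gradient ascent on L
    grad = np.zeros(2)
    for i in range(2):                    # simple central-difference gradient
        d = np.zeros(2)
        d[i] = eps
        grad[i] = (elbo_from_logits(theta + d) - elbo_from_logits(theta - d)) / (2 * eps)
    theta += lr * grad

print("L after optimization:", elbo_from_logits(theta))
print("log p(v):            ", log_p_v)   # L <= log p(v); the remaining gap
                                          # reflects the mean-field restriction
```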
Review of Shannon entropy
We can quantify the amount of uncertainty in an entire probability distribution using the Shannon entropy:
\[H(x) = E_{x \sim P}[I(x)] = - E_{x \sim P}[\log P(x)]\]
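For example (arbitrary probabilities, entropy measured in nats):

```python
# Shannon entropy of a small discrete distribution.
import numpy as np

P = np.array([0.5, 0.25, 0.25])
H = -np.sum(P * np.log(P))        # H(x) = -E_{x~P}[log P(x)]
print(H)                          # ~1.0397 nats; a uniform distribution over
                                  # 3 outcomes would give log(3) ~ 1.0986
```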